Lecture 2: The Garden of Forking Data

Globe Tossing DAG

The globe tossing example from chapter two provides a simple illustration of causal influence.

Show code
library(ggdag)

dag_demo <-
  dagify(
    W ~ N + p,
    L ~ N + p,
    coords = time_ordered_coords()
  )

ggdag(dag_demo) + theme_dag()

Here, N is the number of times the globe is tossed, W the number of times the globe is caught with your finger on water, L the number of catches on land, and p the proportion of the earth's surface covered by water.

The DAG says that "the proportion of the earth covered by water and the number of tosses influence both the number of water catches and the number of land catches," an obvious and profound result.
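The DAG doubles as a generative model: fix N and p, and W and L follow. A minimal simulation sketch (the seed and the value p = 0.71, the familiar water proportion, are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(13)

N = 9     # number of globe tosses
p = 0.71  # true proportion of water

# As the DAG states, W and L are determined only by N and p
W = rng.binomial(N, p)  # water catches
L = N - W               # land catches

print(W, L)
```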

Bayesian Data Analysis

For each possible explanation of the sample,
Count all the ways the sample could happen.
Explanations with more ways to produce the sample are more plausible.

Statistical Rethinking 2023 - 02 - The Garden of Forking Data (9:22)
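The counting recipe can be made concrete with a miniature four-sided globe: each candidate composition offers so many "water paths" per toss, and path counts multiply across the sample. The sample W, L, W below is an assumed example, not taken from the lecture:

```python
import numpy as np

sides = 4
W, L = 2, 1  # observed sample: water, land, water

# A candidate with w water sides can produce each water toss w ways
# and each land toss (sides - w) ways; ways multiply across tosses.
water_sides = np.arange(sides + 1)  # candidate compositions: 0..4 water sides
ways = water_sides**W * (sides - water_sides)**L

# Normalizing the counts gives relative plausibility
plausibility = ways / ways.sum()
for w, n, pl in zip(water_sides, ways, plausibility):
    print(f"{w}/{sides} water: {n} ways, plausibility {pl:.2f}")
```

Candidates that could not have produced the sample at all (all land, all water) get zero ways, and the rest are ranked by how many paths they offer.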

Bayesian Learning

Bayesian models learn, adjusting their estimates as each new observation arrives:

Show code
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as st

x = np.linspace(0, 1)
fig, axes = plt.subplots(3, 3, figsize=(10, 10))
axes = axes.flatten()

# 1 = Water, 0 = Land
tosses = [1, 0, 0, 1, 0, 1, 1, 1, 1]

# Calculate the maximum y-value across all distributions
max_y = 0
for i in range(1, len(tosses) + 1):
    W = sum(tosses[0:i])
    L = i - W
    y_vals = st.beta.pdf(x, W + 1, L + 1)
    max_y = max(max_y, np.max(y_vals))

# Add a small margin to the max_y for plotting
max_y *= 1.1

# Accumulates the W/L history used for subplot titles
label = ""

# each iteration represents a learning iteration
for i in range(1, len(tosses) + 1):
    if tosses[i - 1] == 0:
        label += "L"
    else:
        label += "W"
    W = sum(tosses[0:i])
    L = i - W
    # True water proportion for reference
    axes[i - 1].axvline(x=0.71, color="red", linestyle="dashed")
    # Posterior after i tosses: Beta(W + 1, L + 1) under a flat prior
    axes[i - 1].plot(x, st.beta.pdf(x, W + 1, L + 1), "b-")
    axes[i - 1].set_title(label)
    axes[i - 1].set_xlabel("p")
    axes[i - 1].set_ylabel("Density")
    axes[i - 1].set_ylim(0, max_y)

plt.tight_layout()
plt.show()
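The Beta(W + 1, L + 1) posterior used above is the conjugate update of a flat Beta(1, 1) prior under a binomial likelihood. A quick sanity check against a grid approximation (the grid size is arbitrary):

```python
import numpy as np
import scipy.stats as st

W, L = 6, 3
grid = np.linspace(0, 1, 1001)
h = grid[1] - grid[0]

# Unnormalized posterior: flat prior times binomial likelihood
unnorm = st.binom.pmf(W, W + L, grid)
grid_post = unnorm / (unnorm.sum() * h)

# Conjugate result: Beta(W + 1, L + 1)
beta_post = st.beta.pdf(grid, W + 1, L + 1)

print(np.max(np.abs(grid_post - beta_post)))  # should be near zero
```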

Sampling from the posterior

Instead of the model “learning” the posterior, you can sample from the posterior directly:

Show code
from IPython.display import HTML
import matplotlib.animation as animation

fig, ax = plt.subplots(1, 3, figsize=(10.5, 3.5))
post_preds = []


# Despite the name, this draws directly from the analytical Beta posterior
def hmcmc(frame, W=6, L=3):
    # Clear axes before replotting
    for ax_ in ax:
        ax_.clear()

    # Sample from Posterior
    sample = st.beta.rvs(W + 1, L + 1)
    post_preds.append(sample * (W + L))

    # Plot 1: The Beta PDF + Current Sample
    ax[0].plot(x, st.beta.pdf(x, W + 1, L + 1), "b-")
    ax[0].axvline(x=sample, color="red", linestyle="dashed", alpha=0.6)
    ax[0].set_title(f"Analytical PDF (Sample: {sample:.2f})")

    # Plot 2: Predictive Distribution (Binomial)
    pred_samples = st.binom.rvs(n=W + L, p=sample, size=1000)
    ax[1].hist(
        pred_samples,
        bins=np.arange(W + L + 2) - 0.5,
        color="skyblue",
        edgecolor="black",
    )
    ax[1].set_title("Predictive Distribution")
    ax[1].set_ylim(0, 350)

    # Plot 3: Posterior Predictive
    ax[2].hist(
        post_preds,
        bins=np.arange(W + L + 2) - 0.5,
        color="orange",
        edgecolor="black",
    )
    ax[2].set_title(f"Posterior Predictive (N={len(post_preds)})")
    ax[2].set_ylim(0, 45)


ani = animation.FuncAnimation(fig=fig, func=hmcmc, frames=100, interval=50)
plt.close()
HTML(ani.to_jshtml())
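Stripped of the animation, the posterior predictive is just a two-stage simulation: draw p from the posterior, then draw new counts given that p. A static sketch (the number of draws and seeds are arbitrary):

```python
import scipy.stats as st

W, L = 6, 3
n_draws = 100_000

# Stage 1: parameter uncertainty, sample p from Beta(W + 1, L + 1)
p_samples = st.beta.rvs(W + 1, L + 1, size=n_draws, random_state=13)

# Stage 2: sampling uncertainty, simulate new tosses for each sampled p
post_pred = st.binom.rvs(n=W + L, p=p_samples, random_state=42)

# Predictive mean should sit near (W + L) times the posterior mean of p
print(post_pred.mean(), (W + L) * (W + 1) / (W + L + 2))
```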

Misclassification

Show code
set.seed(13)
dag <-
  dagify(
    W ~ p + N,
    W_ ~ W + M,
    latent = c("p"),
    labels = c(
      p = " "
    )
  )

ggdag(dag, use_labels = "label") + ggdag::theme_dag()

We can model M, our measurement error. In lecture 2, this entered through the acceptance criterion of the sampler.
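One standard way to write such a measurement model (a sketch, independent of the sampler code below): if each catch is misrecorded with probability x, the chance a toss is recorded as water becomes p(1 - x) + (1 - p)x, and the likelihood is binomial in that adjusted probability. The function name here is hypothetical:

```python
import scipy.stats as st

def misclassified_likelihood(W_obs, N, p, x):
    """Likelihood of recording W_obs water catches out of N tosses
    when each true catch is flipped with probability x."""
    p_obs = p * (1 - x) + (1 - p) * x
    return st.binom.pmf(W_obs, N, p_obs)

# With no measurement error the adjusted likelihood reduces to the plain one
print(misclassified_likelihood(6, 9, 0.7, 0.0))
print(st.binom.pmf(6, 9, 0.7))
```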

Show code
W = 6
L = 3
N = 1000
p = np.repeat(0.0, N)
p[0] = 0.5
p_unadjusted = p.copy()
q0 = st.binom.pmf(W, W + L, p[0])
for i in range(1, N):
    # Propose a value
    p_new = st.norm.rvs(p[i - 1], 0.1)

    # Enforce [0, 1]
    while p_new < 0:
        p_new = np.abs(p_new)
    while p_new > 1:
        p_new = 2 - p_new

    # Keep unadjusted sample
    p_unadjusted[i] = p_new

    # Accept/reject step (standing in for the unobserved misclassification)
    q1 = st.binom.pmf(W, W + L, p_new)
    if st.uniform.rvs(0, 1) < q1 / q0:
        # Accept
        p[i] = p_new
        q0 = q1
    else:
        # Reject
        p[i] = p[i - 1]

import arviz as az

x = np.linspace(0, 1)
fig, ax = plt.subplots(1, 1)
ax2 = ax.twinx()

ax2.plot(x, st.beta.pdf(x, W + 1, L + 1), "r-", label="Analytical posterior")

az.plot_kde(p, label="Metropolis approximation")
az.plot_kde(
    p_unadjusted,
    label="Metropolis approximation (unadjusted)",
    plot_kwargs={"color": "purple"},
)

fig.tight_layout()
ax.legend()
plt.show()

This plot compares the density estimate for the unadjusted samples with the "misclassification"-adjusted posterior. Notice that the adjusted posterior more closely resembles the analytical posterior and is steeper: more precision around the parameter!
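That precision claim can be quantified without rerunning the chains: the unadjusted proposals wander roughly uniformly over [0, 1], while the accepted samples target Beta(W + 1, L + 1), whose standard deviation is much smaller. A rough check (treating the unadjusted draws as uniform is an approximation):

```python
import scipy.stats as st

W, L = 6, 3

# Spread of the analytical posterior the sampler targets
posterior_sd = st.beta.std(W + 1, L + 1)

# Spread of a flat distribution over [0, 1], approximating the
# unadjusted random-walk proposals
uniform_sd = st.uniform.std(0, 1)

print(posterior_sd, uniform_sd)
```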